Emotional transplant in statistical speech synthesis based on emotion additive model
نویسندگان
چکیده
This paper proposes a novel method to transplant emotions to a new speaker in statistical speech synthesis based on an emotion additive model (EAM), which represents the differences between emotional and neutral voices. This method trains EAM using neutral and emotional speech data of multiple speakers and applies it to a neutral voice model of a new speaker (target). There is some degradation in speech quality due to a mismatch in speakers between the EAM and the target neutral voice model. To alleviate the mismatch, we introduce an eigenvoice technique to this framework. We build neutral voice models and EAMs using multiple speakers, and construct an eigenvoice space consisting the neutral voice models and EAMs. To transplant the emotion to the target speaker, the proposed method estimates weights of eigenvoices for the target neutral speech data based on a maximum likelihood criteria. The EAM of the target speaker is obtained by applying the estimated weights to the EAM parameters of the eigenvoice space. Emotional speech is generated using the EAM and the neutral voice model. Experimental results show that the proposed method performs emotional speech synthesis with reasonable emotions and high speech quality.
منابع مشابه
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کاملRecognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
متن کاملSpeech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملThe Function of Pitch Range Variations in Samples of Emotional Expressions in Persian
This study aims at investigating the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and Statistical analysis show that mean level and range of FO in the contours vary strongly as a function of the degree o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015